1 Introduction

Spatial aspects of your data can provide a lot of insights into the spread of a disease or the situation of an outbreak, answering questions such as:

In this lesson, we will learn why R is well suited to these tasks and to the needs of applied epidemiologists.

2 Learning objectives

  1. What is geospatial analysis?

  2. What is GIS software?

  3. Merits of R as a GIS software.

3 Prerequisites

This lesson requires familiarity with basic R and {ggplot2}: if you need to brush up, have a look at our introductory course on R and data visualization.

4 What is Geospatial analysis?

Geospatial analysis is the study of data with geographic locations or coordinates–that is, data related to positions on the Earth’s surface. This branch of data analysis is essential to epidemiology.
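To make the idea of "data with coordinates" concrete, here is a minimal sketch, assuming the {sf} package is installed. The facility names and coordinates are hypothetical and serve only as an illustration.

```r
# Minimal sketch: turning a plain table with coordinates into spatial data.
# The facility names and coordinates below are made up for illustration.
library(sf)

facilities <- data.frame(
  name = c("Clinic A", "Clinic B", "Clinic C"),
  lon  = c(-16.58, -15.60, -14.20),
  lat  = c(13.45, 13.55, 13.40)
)

# Declare which columns hold the coordinates and which coordinate
# reference system they use (EPSG:4326 = longitude/latitude on WGS84)
facilities_sf <- st_as_sf(facilities, coords = c("lon", "lat"), crs = 4326)

# The result is still a data frame, now with a geometry column
print(facilities_sf)
```

Once the coordinates are declared this way, each row carries a position on the Earth's surface that mapping and analysis functions can use.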

A geospatial analysis can, for example, help us to:

As an example, here is a map of malaria prevalence predictions in The Gambia created with {ggplot2}, adapted from Moraga, 2019.

# 👉 first, get packages:

if(!require('pacman')) install.packages('pacman')
pacman::p_load_gh("wmgeolab/rgeoboundaries")
pacman::p_load(tidyverse, ggspatial, leaflet, 
               raster, stars, here)

# 👉 second, get data:

# country boundaries
gambia_boundaries <- geoboundaries(country = "Gambia", adm_lvl = 1)
# malaria prevalence
gambia_prevalence <- read_rds(here("ch06_basic_geospatial_viz",
                                   "data", "gambia_prevalence.rds"))
# 👉 third, plot data:

ggplot() +
  # with a background
  annotation_map_tile(data = gambia_boundaries, zoomin = 0) +
  # plus a prevalence surface
  geom_stars(data = st_as_stars(gambia_prevalence)) +
  # with a color scale
  scale_fill_viridis_c(na.value = "transparent", alpha = 0.75) +
  # and a coordinate system
  coord_sf()

In this chapter, you will learn the basic skills required to use R for the geospatial visualization of epidemiological data to make accurate, elegant and informative maps.

5 R as a GIS

Every geospatial analysis needs a geographic information system (GIS).

A GIS is a software application or platform for managing, analyzing, and visualizing spatial data. The most popular GIS platforms, like ArcGIS (paid) and QGIS (free), are primarily based on a graphical user interface (GUI): they work with visual point-and-click interfaces, not with code scripts.
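The three GIS tasks named above (managing, analyzing, and visualizing spatial data) can also be carried out with code. The following is a minimal sketch, assuming the {sf} package is installed; it uses the North Carolina counties shapefile that ships with {sf}, and the column name `area_km2` is introduced here for illustration.

```r
# Minimal sketch of the three GIS tasks, done in code with {sf}
library(sf)

# Manage: read a shapefile shipped with {sf} (North Carolina counties)
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Analyze: project to a coordinate system in metres
# (EPSG:32119 = NAD83 / North Carolina), then compute county areas
nc_proj <- st_transform(nc, 32119)
nc_proj$area_km2 <- as.numeric(st_area(nc_proj)) / 1e6

# Visualize: quick base plot of the new variable
plot(nc_proj["area_km2"])
```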

So why use R for geospatial work? Let’s look at its merits:

5.1 (1/5) Reproducibility:

When you do geospatial analysis with code rather than by pointing and clicking, anyone can reproduce your analysis steps simply by re-running your script. Similarly, you can easily build on other people’s work. This facilitates collaboration with your colleagues (or with your future self)!

Take the following code as an example. If you paste it into any R session, you should be able to reproduce on your own computer the map that I built:

# 👉 packages
if(!require('pacman')) install.packages('pacman')
pacman::p_load(sf, ggplot2)

# 👉 data 
nc <- st_read(system.file("shape/nc.shp", package = "sf"),
              quiet = TRUE)
# 👉 plot
ggplot(data = nc) + 
  geom_sf(aes(fill = SID74)) +
  scale_fill_viridis_c()

5.2 (2/5) Reporting:

Tools like {rmarkdown}, {flexdashboard} and {shiny} make it easy to generate elegant reports and dashboards for sharing your geospatial work.

For example, we can include our previous map in an interactive dashboard using the {leaflet} package, instead of {ggplot2}:

# 👉 packages
if(!require('pacman')) install.packages('pacman')
pacman::p_load(sf, leaflet)

# 👉 data
nc <- st_read(system.file("shape/nc.shp", package = "sf"),
              quiet = TRUE)

# 👉 plot
pal <- colorNumeric("YlOrRd", domain = nc$SID74)
leaflet(nc) %>%
  addTiles() %>%
  addPolygons(color = "white", fillColor = ~ pal(SID74),
              fillOpacity = 1) %>%
  addLegend(pal = pal, values = ~SID74, opacity = 1)

5.3 (3/5) Rich ecosystem:

R has a rich and rapidly growing collection of libraries for working with geospatial data. Because of R’s highly active open-source community, one can usually find ready-to-use packages or tutorials for most tasks with geospatial data.

For example, if we replace the {leaflet} package with {mapview}, we can recreate the previous interactive map with only one line of plotting code!

# 👉 packages
if(!require('pacman')) install.packages('pacman')
pacman::p_load(sf, mapview)

# 👉 data
nc <- st_read(system.file("shape/nc.shp", package = "sf"),
              quiet = TRUE)

# 👉 plot
mapview(nc, zcol = "SID74")

5.4 (4/5) Convenience:

You already know R! This means you can copy, paste and modify any reproducible piece of code that you find.

As an example, we will build a map with the {tmap} package and then make minor modifications to it!

First, run this chunk:

# 👉 packages
if(!require('pacman')) install.packages('pacman')
pacman::p_load(tmap, spData, here) # {here} provides here(), used to build the data path below

# 👉 data
load(here("ch06_basic_geospatial_viz/data/nz_elev.rda"))

# 👉 plot
tm_shape(nz_elev)  +
  tm_raster(title = "elev", 
            style = "cont",
            palette = "-RdYlGn") +
  tm_shape(nz) +
  tm_borders(col = "red", 
             lwd = 3) +
  tm_scale_bar(breaks = c(0, 100, 200),
               text.size = 1) +
  tm_compass(position = c("LEFT", "center"),
             type = "rose", 
             size = 2) +
  tm_credits(text = "J. Nowosad, 2019") +
  tm_layout(main.title = "My map",
            bg.color = "lightblue",
            inner.margins = c(0, 0, 0, 0))

Now, apply any of the following suggestions to get used to how this package works:

  1. Change the map title from “My map” to “New Zealand”.
  2. Update the map credits with your own name and today’s date.
  3. Change the color palette to “BuGn”.
  4. Try other palettes from http://colorbrewer2.org/
  5. Put the north arrow in the top right corner of the map.
  6. Improve the legend title by adding the legend units.
  7. Increase the number of breaks in the scale bar.
  8. Change the border color of New Zealand’s regions to black.
  9. Decrease the line width.
  10. Change the background color to any color of your choice.

5.5 (5/5) Integrated workflow:

Finally, with R you can combine geospatial visualization and analyses with other statistical and epidemiological analyses, all within a single script.

For example, you can build 3D maps of Monterey Bay using the {rayshader} package. To reproduce this, you can follow the tutorial available at this link: https://www.tylermw.com/3d-maps-with-rayshader/

You can also build fancy bivariate maps to highlight the unequal distribution of income per country. You can follow the whole workflow for this map in this tutorial: https://timogrossenbacher.ch/2019/04/bivariate-maps-with-ggplot2-and-sf/
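A much simpler illustration of an integrated workflow is sketched below, assuming {sf} and {ggplot2} are installed. It reuses the North Carolina data from the earlier examples; the variable names `sid_rate` and `rate_map` are introduced here for illustration. An epidemiological calculation, a statistical summary and a map all live in the same script.

```r
# Minimal sketch: epidemiological calculation, summary and map in one script
library(sf)
library(ggplot2)

# Data: North Carolina counties, shipped with {sf}
nc <- st_read(system.file("shape/nc.shp", package = "sf"), quiet = TRUE)

# Epidemiological step: SIDS cases per 1,000 live births
# (SID74 = cases, BIR74 = live births, same period)
nc$sid_rate <- 1000 * nc$SID74 / nc$BIR74

# Statistical step: summarize the distribution of the rate
print(summary(nc$sid_rate))

# Geospatial step: map the rate by county
rate_map <- ggplot(data = nc) +
  geom_sf(aes(fill = sid_rate)) +
  scale_fill_viridis_c(name = "SIDS per 1,000 births")

print(rate_map)
```

Because every step is in one script, changing the rate definition automatically updates both the summary and the map.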

6 Wrap up

In this first lesson, we learned why to use R as a GIS and how to take advantage of its coding environment.

But which maps will we build first in this chapter?

Figure 1. Thematic maps: (A) Choropleth map, (B) Dot map, (C) Density map, and (D) Basemap for a dot map.

In the following lessons, we will learn how to build, step by step, different types of thematic maps using the {ggplot2} package, with different data sources and illustrative annotations.

Figure 2. {ggplot2} map with text annotations, a scale bar and north arrow.

Contributors

The following team members contributed to this lesson:

References

Some material in this lesson was adapted from the following sources:

This work is licensed under the Creative Commons Attribution Share Alike license.